This tutorial is designed for R users who already have a basic understanding of R and its functions and would like to improve their data wrangling and visualisation skills. Don't be scared, though: the functions and methods are not super complex. Whether you are a Data/BI Analyst, programmer, student or just a data enthusiast, as long as you have already gained a bit of confidence with R, this tutorial is for you!
Spoiler - this is not another US election dataset tutorial!
As you know, data by itself is just a combination of numbers and characters; the purpose of analysing and visualising it is to tell a good story. But to tell a good story, you need interesting data.
Back when I was a student, I remember learning R using the same sort of data every single time: US elections by state, GDP per capita, average household, etc. - always the same. Well, I wrote this tutorial because I think there are just not enough tutorials with interesting data out there. So no, we will not go through the same old datasets. Instead, this tutorial will teach you some R skills and expose you to a story you have probably never heard of.
The tutorial makes use of catch data from a Shark Control Program in the State of Queensland, Australia. In short, this is a governmental program that tries to reduce shark risk along the shores of Queensland using nets and drumlines (a type of baited hook). Want to learn more? Have a look at this website I built for more information.
The data at the base of this analysis was downloaded from QFISH, an online data portal of the State of Queensland, Department of Agriculture and Fisheries. The data is publicly available under a Creative Commons Attribution 3.0 Australia (CC BY) licence.
Downloading the data in a format that contains all the information we want is a bit tricky. To make things easier, I gathered all the files in a zip file which you can download —here—. Otherwise, you need to go to QFISH and run custom data queries on the site for each year. To get exactly the same tables, use the following parameters:
Here are the libraries we will be using in this tutorial:
tidyverse, readxl, data.table, lubridate, janitor, ggpubr
I assume that you already know what R libraries are and how to install and load libraries into your R environment, in case you forgot:
Installing:
#Installing Packages
install.packages('tidyverse')
install.packages('readxl')
install.packages('data.table')
install.packages('lubridate')
install.packages('janitor')
install.packages('ggpubr')

Loading:
library('tidyverse')
library('readxl')
library('data.table')
library('lubridate')
library('janitor')
library('ggpubr')

This is the simplest way to install and load R libraries, but in my view it is a bit clunky when using multiple R packages. In our case we use six libraries, but what if you need 10 packages or more? What if they are already installed? What if you are writing code that will run on different machines?
For these reasons my preferred way to load libraries is by listing them all as a variable, and then loading them with a simple function. If the library is not already installed, the function will automatically install it for us:
# Create a list of all the packages we need
packages<-c('tidyverse','readxl','data.table','lubridate','janitor','ggpubr')
# Create a variable that lists all the installed packages
installed_packages <- packages %in% rownames(installed.packages())
# Check which of the packages are already installed and install them if they are not
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages])
}
# Load all the libraries
invisible(lapply(packages, library, character.only = TRUE))
#Remove the variables we created
rm(installed_packages,packages)
Throughout this tutorial you will see that I use some base R commands but mostly dplyr commands (from the tidyverse), which make use of the pipe operator (%>%). If you haven't used dplyr before, I recommend practicing it before doing this tutorial. It is commonly used among R users and is a crucial skill if you want to master programming with R.
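If you need a quick refresher, here is a minimal, self-contained sketch of the pipe in action. It uses the built-in mtcars dataset, which is not part of this tutorial's data:

```r
library(dplyr)

# Each %>% passes the result on the left into the function on the right
avg <- mtcars %>%
  filter(cyl == 6) %>%             # keep only the 6-cylinder cars
  summarise(mean_mpg = mean(mpg))  # average their fuel economy

avg
```

Reading top to bottom like this is exactly how the longer pipelines later in the tutorial work.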
I always find it easier to have a quick look at the data first to understand the data structure. Maybe the csv contains a blank row at the top? Are there any parameters we need to take into account when loading the data into R?
Open the files with your preferred software; it can be MS Excel, Google Sheets, Notepad, etc. What is the most efficient way to load all the data into one dataframe in R? Can you find any repetitive patterns? Can we develop code that will allow us to automate this process for future updates?
If you had a look at one, two, three or all of the files, you must have noticed the following: each file starts with an extra row that we need to skip when loading, and the year of each record appears only in the file name.
As I mentioned, I always prefer to write plug-and-play code, i.e. code that can be used on any machine with minimal inputs. Theoretically, we could have code that downloads the data and saves it, but we would still need to specify where we want to save the data. What I usually do when working in a desktop environment is set my default directory to the folder where my R project is saved, and save all related data in the same folder - in our case the "Raw Data" folder.
To set your directory to the same folder as the R project file:
# default directory
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

Next, we want to read all the files into one dataframe. To do so I use the list.files function. The first argument specifies the source (where the function should look for the files), and the pattern argument specifies what we want to find. When setting the pattern to ".csv", the function will look for all the csv files in the folder we specified. It is important to set full.names to TRUE so that we list the full path for each file.
# List all csv files in the folder
# If you saved the data somewhere else, just edit the path below
list.files('Raw Data/New Source (Qfish)',pattern = ".csv",full.names = TRUE)

## [1] "Raw Data/New Source (Qfish)/2001.csv"
## [2] "Raw Data/New Source (Qfish)/2002.csv"
## [3] "Raw Data/New Source (Qfish)/2003.csv"
## [4] "Raw Data/New Source (Qfish)/2004.csv"
## [5] "Raw Data/New Source (Qfish)/2005.csv"
## [6] "Raw Data/New Source (Qfish)/2006.csv"
## [7] "Raw Data/New Source (Qfish)/2007.csv"
## [8] "Raw Data/New Source (Qfish)/2008.csv"
## [9] "Raw Data/New Source (Qfish)/2009.csv"
## [10] "Raw Data/New Source (Qfish)/2010.csv"
## [11] "Raw Data/New Source (Qfish)/2011.csv"
## [12] "Raw Data/New Source (Qfish)/2012.csv"
## [13] "Raw Data/New Source (Qfish)/2013.csv"
## [14] "Raw Data/New Source (Qfish)/2014.csv"
## [15] "Raw Data/New Source (Qfish)/2015.csv"
## [16] "Raw Data/New Source (Qfish)/2016.csv"
## [17] "Raw Data/New Source (Qfish)/2017.csv"
## [18] "Raw Data/New Source (Qfish)/2018.csv"
## [19] "Raw Data/New Source (Qfish)/2019.csv"
## [20] "Raw Data/New Source (Qfish)/2020.csv"
## [21] "Raw Data/New Source (Qfish)/2021.csv"
## [22] "Raw Data/New Source (Qfish)/2022.csv"
## [23] "Raw Data/New Source (Qfish)/2023 (until March).csv"
Now that we understand how this works, let's create a variable called "files" that lists all the csv files in that folder:
# List all csv files in the folder
# If you saved the data somewhere else, just edit the path below
files<-list.files('Raw Data/New Source (Qfish)',pattern = ".csv",full.names = TRUE)

Now that we have created the "files" variable, we need to load all the csv files listed in it into one dataframe. If you remember, based on our initial data observation, we also want to skip the first row when loading the csv files and use each file's name to create a 'Year' column.
Luckily we can do all of the above in a few lines of code. This might look complex if you haven't used much R in the past. The good news is that this is probably the most complex command used in this tutorial, so if you understand it, great! If not, see my explanation below, but don't worry too much if you don't understand all the components.
Using the "files" variable we just created, we can use the lapply base function to load all the different csvs into one variable called "Data". For each file, we apply a function that reads it with base R's read.csv while skipping the first row. The argument Year = files[x] takes each file's name and adds it to a column called "Year".
# Read and combine csvs
# We use lapply to apply our read function to every csv file listed in 'files'
Data<-lapply(seq_along(files), function(x) transform(read.csv(files[x],skip = 1), Year = files[x]))

This call created a list of data frames. Since we are interested in one dataframe that contains all the data, rather than a list of multiple data frames, we can convert the list using data.table's rbindlist function:
#Convert datalist to data frame
Data<-rbindlist(Data,fill=TRUE,use.names = TRUE)
#And since we no longer need our old variable, we can delete it
rm(files)

We have now successfully merged all the csv files into one data frame. If you want to see the dataset, you can either open it in the RStudio viewer or use the view function:
#Open the viewer using a command
view(Data)

Alternatively, you can use the head command to get a glimpse of the data:
#Print the first 10 rows of the data frame
head(Data,10)

## SpeciesName LengthRange SpeciesGroup Gender LengthCategory Fate
## 1: Australian Blacktip 0-1m Shark Female < 2m Dead
## 2: Australian Hb Dolphin Non-shark Mammal Female Non-shark Dead
## 3: Australian Hb Dolphin Non-shark Mammal Unknown Non-shark Dead
## 4: Australian Hb Dolphin Non-shark Mammal Total
## 5: Big Nose Whaler 2-3m Shark Female >= 2m Dead
## 6: Blacktip Reef Whaler 0-1m Shark Female < 2m Dead
## 7: Blacktip Reef Whaler 0-1m Shark Female < 2m Dead
## 8: Blacktip Reef Whaler 0-1m Shark Female < 2m Dead
## 9: Blacktip Reef Whaler 0-1m Shark Female < 2m Dead
## 10: Blacktip Reef Whaler 0-1m Shark Female < 2m Dead
## Beach GearStatus GearName
## 1: Kellys Beach Not Active Kellys Beach Drum 0
## 2: Clifton Beach Not Active Clifton Beach Net 0
## 3: Rainbow Beach Active Rainbow Beach Net 1
## 4:
## 5: Coolum Beach Not Active Coolum Beach Net 0
## 6: Buchans Point Beach Not Active Buchans Point Beach Drum 0
## 7: Clifton Beach Not Active Clifton Beach Drum 0
## 8: Palm Cove Not Active Palm Cove Drum 0
## 9: Tannum Sands Not Active Tannum Sands Drum 1
## 10: Yorkeys Knob Active Yorkeys Knob Drum 30
## Area Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1: Bundaberg NA NA NA NA NA NA NA NA NA 1 NA NA
## 2: Cairns NA NA NA NA NA NA NA NA 1 NA NA NA
## 3: Rainbow Beach NA NA NA NA 1 NA NA NA NA NA NA NA
## 4: NA NA NA NA 1 NA NA NA 1 NA NA NA
## 5: Sunshine Coast North NA NA NA 1 NA NA NA NA NA NA NA NA
## 6: Cairns 1 NA NA NA NA NA NA NA NA NA NA NA
## 7: Cairns 1 NA NA NA NA NA NA NA NA NA NA NA
## 8: Cairns 1 NA NA NA NA NA NA NA NA NA NA NA
## 9: Gladstone NA NA 1 NA NA NA NA NA NA NA NA NA
## 10: Cairns NA NA NA NA NA NA NA NA NA NA NA 1
## Grand.Total Year
## 1: 1 Raw Data/New Source (Qfish)/2001.csv
## 2: 1 Raw Data/New Source (Qfish)/2001.csv
## 3: 1 Raw Data/New Source (Qfish)/2001.csv
## 4: 2 Raw Data/New Source (Qfish)/2001.csv
## 5: 1 Raw Data/New Source (Qfish)/2001.csv
## 6: 1 Raw Data/New Source (Qfish)/2001.csv
## 7: 1 Raw Data/New Source (Qfish)/2001.csv
## 8: 1 Raw Data/New Source (Qfish)/2001.csv
## 9: 1 Raw Data/New Source (Qfish)/2001.csv
## 10: 1 Raw Data/New Source (Qfish)/2001.csv
Now the real fun begins! We need to clean and sort our data so that we can make sense of it later.
Remember we took the name of each csv file and turned it into a column called "Year" - at the moment this is a long string (text) that contains the path of each file. Let's extract the year number and convert it to a numeric column.
#Convert file path string in year column to a number
Data<-Data%>%
mutate(Year=as.numeric(
gsub(".*?([0-9]+).*", "\\1",Year)
)
)

And to check the new column:
#Print the first 5 values of the "Year" column
head(Data$Year,5)

## [1] 2001 2001 2001 2001 2001
If you look at the data, you will see that each column has some values that contain the word "Total". This is because the data originally came from a query on QFISH which produced totals per species, per month, per year. Since we want to undo this aggregation and create a dataset of all individual animals that were caught, we no longer need these values.
#Remove "Total" cells
Data<-Data[with(Data, !grepl("Total", paste(SpeciesName,LengthRange,SpeciesGroup,Gender,LengthCategory,
Fate,Beach,GearStatus,Area,Jan,Feb,Mar,Apr,May,Jun,
Jul,Aug,Sep,Oct,Nov,Dec))),]
#Remove Grand.Total Column
Data<-Data%>%
select(-Grand.Total)
Have a look at the data again. As you can see, the data is aggregated by month. This is again a result of the data source. In order to "de-aggregate" the data, we need to pivot it from a 'wide' format to a 'long' format.
source: STATOLOGY, https://www.statology.org/long-vs-wide-data/
Basically, what we want to do is take all the month columns (Jan-Dec) and convert them into one column called "Month", turning the value in each month column into individual rows. To do this I use pivot_longer from tidyr (loaded as part of the tidyverse).
#Unpivot the data by months
Data<-Data%>%
pivot_longer(cols = 11:22,
names_to = 'Month',
values_to = 'n',
values_drop_na = TRUE
)

Now let's check the data again:
#Print the first 10 rows of the data frame
head(Data,10)

## # A tibble: 10 x 13
## SpeciesName LengthRange SpeciesGroup Gender LengthCategory Fate Beach
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Australian Blackt~ 0-1m Shark Female < 2m Dead Kell~
## 2 Australian Hb Dol~ Non-shark Mammal Female Non-shark Dead Clif~
## 3 Australian Hb Dol~ Non-shark Mammal Unkno~ Non-shark Dead Rain~
## 4 Big Nose Whaler 2-3m Shark Female >= 2m Dead Cool~
## 5 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead Buch~
## 6 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead Clif~
## 7 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead Palm~
## 8 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead Tann~
## 9 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead York~
## 10 Blacktip Reef Wha~ 0-1m Shark Female < 2m Dead York~
## # i 6 more variables: GearStatus <chr>, GearName <chr>, Area <chr>, Year <dbl>,
## # Month <chr>, n <int>
You can see that we now have a new column “n” - this is a result of the ‘pivot_longer’ function we used before. Basically the value in ‘n’ is the same value that we originally had for each month column. So if the value for a particular species in Jan 2020 was 4, we now have one row for Jan 2020 where n=4. In order to completely “de-aggregate” the data, we need to ‘explode’ each row into multiple rows based on the value in ‘n’, so that if n=4, we will have 4 rows instead. The following code does that using the uncount function:
#Explode each row into multiple rows based on the value in 'n'
Data<-Data%>%
uncount(weights = n)
It is always good practice to have a unique ID for each record in a database. There are numerous ways to do this, but since we are not going to use these IDs, we can just do it the easy way by using the row number:
#Add ID column
Data<-Data%>%
mutate(ID=row_number())%>%
relocate(ID,.before = SpeciesName)
Currently, the "GearName" column is a string that contains three pieces of information. For example, "Kellys Beach Drum 0" contains the location name ("Kellys Beach"), the gear type (Drum) and the gear number (0). This is valuable information that we can extract into separate columns:
#Extract drumline and net and gear number
Data<-Data%>%
mutate(Feature=case_when(
str_detect(GearName,"Drum")==TRUE~'Drum Line',
TRUE~'Net'
))%>%
mutate(Gear_Number=extract_numeric(GearName))
#Edit specific values with missing gear values:
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
Data$Gear_Number<-ifelse(is.na(Data$Gear_Number),as.numeric(substrRight(Data$GearName,2)),Data$Gear_Number)
Data$Beach<-gsub('-.*','',Data$Beach)
#remove old columns
Data<-Data%>%
select(-GearName)
Now, something easier. In case we want to do any date-related analysis later, we need a date column. Unfortunately, we don't have enough information to create a column with the exact date, but since we have the month and the year, we can create a date column using the first day of each month. Note that this is not accurate for analysis that requires an exact date, but for visualising change over time (even on a monthly basis) it is better to have a date column in the data.
#Create Date Column
Data<-Data%>%
mutate(Date=dmy(paste0('01-',Month,'-',Year)))
The next fix was quite manual; I originally found the need for it while trying to join this database to another database that specifies the location of each drumline/net. I found that the same location (beach) sometimes has different names or spellings, and any discrepancy will prevent rows from matching correctly.
If you want to have a look yourself, here’s an example:
Data%>%
filter(str_detect(Beach,'Bargara')==TRUE)%>%
distinct(Beach)

## # A tibble: 2 x 1
## Beach
## <chr>
## 1 Bargara
## 2 Bargara Beach
These are the same place spelled differently… If you want to find more cases, you can use the distinct function and list them all. Luckily I’ve done the job for you and you can just run the code below.
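If you want to hunt for such duplicates programmatically rather than by eye, here is a small sketch. It uses a hypothetical vector of names (not the real data) and flags names that are a prefix of another name, which is the most common pattern among the fixes below:

```r
library(dplyr)
library(stringr)

# Hypothetical sample of beach names
beaches <- c("Bargara", "Bargara Beach", "Noosa", "Trinity", "Trinity Beach")

# Pair every name with every other name and keep pairs where
# one name is the start of the other
pairs <- expand.grid(short = beaches, long = beaches, stringsAsFactors = FALSE) %>%
  filter(short != long, str_starts(long, short))

pairs
```

This will not catch every variant (e.g. "Oaks Beach" vs "Oak Beach"), so a manual pass over distinct(Beach) is still worthwhile.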
#Fix duplicated names
Data$Beach[Data$Beach=='Bargara']<-'Bargara Beach'
Data$Beach[Data$Beach=='Caloundra']<-'Caloundra Beach'
Data$Beach[Data$Beach=='Currimundi']<-'Currimundi Beach'
Data$Beach[Data$Beach=='Hyatt Regency Resort']<-'Hyatt Resort'
Data$Beach[Data$Beach=='Mooloolaba']<-'Mooloolaba Beach'
Data$Beach[Data$Beach=='Nobby Beach']<-'Nobbys Beach'
Data$Beach[Data$Beach=='Oaks Beach']<-'Oak Beach'
Data$Beach[Data$Beach=='Sheraton Mirage Beach']<-'Sheraton Mirage'
Data$Beach[Data$Beach=='Buchans']<-'Buchans Point Beach'
Data$Beach[Data$Beach=='Elkhorn Avenue']<-'Elkhorn Ave'
Data$Beach[Data$Beach=='Clifton']<-'Clifton Beach'
Data$Beach[Data$Beach=='Maroochydore Beach']<-'Maroochydore'
Data$Beach[Data$Beach=='Mullambin Beach']<-'Mulambin Beach'
Data$Beach[Data$Beach=='Narrowneck Beach']<-'Narrowneck'
Data$Beach[Data$Beach=='North Burleigh']<-'North Burleigh Beach'
Data$Beach[Data$Beach=='Northcliffe Beach']<-'Northcliffe'
Data$Beach[Data$Beach=='Palarenda Beach']<-'Pallarenda Beach'
Data$Beach[Data$Beach=='South Lamberts']<-'Sth Lamberts Beach'
Data$Beach[Data$Beach=='Staghorn Avenue']<-'Staghorn Ave'
Data$Beach[Data$Beach=='Surfers Paradise']<-'Surfers Paradise Beach'
Data$Beach[Data$Beach=='Trinity']<-'Trinity Beach'
Data$Beach[Data$Beach=='Wurtulla Beach']<-'Wurtulla'
Data$Beach[Data$Beach=='Yeppoon Beach']<-'Yeppoon'
Data$Beach[Data$Beach=='Yorkeys']<-'Yorkeys Knob'
Data$Beach[Data$Beach=='Holloways']<-'Holloways Beach'
Data$Beach[Data$Beach=='Narrowneck']<-'Narrow Neck'
Data$Beach[Data$Beach=='Smart 507 Mullambin Beach']<-'Mullambin Beach'
Data$Beach[Data$Beach=='Smart 555 Tanby Point']<-'Tanby Point'
Data$Beach[Data$Beach=='Smart 597 Emu Park']<-'Emu Park'
Data$Beach[Data$Beach=='Smart 575 Fishermans Beach']<-'Fishermans Beach'
Data$Beach[Data$Beach=='Smart 552 Tanby Point']<-'Tanby Point'
Data$Beach[Data$Beach=='Smart 606 Emu Park']<-'Emu Park'
Data$Beach[Data$Beach=='Smart 507 Mullambin Beach']<-'Mullambin Beach'
Data$Beach[Data$Beach=='Smart 576 Fishermans Beach']<-'Fishermans Beach'
Data$Beach[Data$Beach=='Smart 519 Mullambin Beach']<-'Mullambin Beach'
Data$Beach[Data$Beach=='Smart 577 Fishermans Beach']<-'Fishermans Beach'
Data$Beach[Data$Beach=='Smart 605 Emu Park']<-'Emu Park'
Data$Beach[Data$Beach=='Smart 577 Fishermans Beach']<-'Fishermans Beach'
Data$Beach[Data$Beach=='Smart 577 Emu Park']<-'Emu Park'
Data$Beach[Data$Beach=='Mullambin Beach']<-'Mulambin Beach'
Data$Beach[Data$Beach=='Smart 520 Tanby Point']<-'Tanby Point'
Data$Beach[Data$Beach=='Smart 520 Mullambin Beach']<-'Mulambin Beach'
Data$Beach[Data$Beach=='Yaroomba']<-'Yaroomba Beach'
#Change Area Names to fit coordinate dataset (joined in the next step)
Data$Area<-ifelse(Data$Beach=='Caloundra Beach','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Currimundi Beach','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Hyatt Resort','Sunshine Coast North',Data$Area)
Data$Area<-ifelse(Data$Beach=='Mooloolaba Beach','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Yaroomba Beach','Sunshine Coast North',Data$Area)
Data$Area<-ifelse(Data$Beach=='Point Cartwright','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Wurtulla','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Moffat Beach','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Alexandra Headland','Sunshine Coast South & Bribie Island',Data$Area)
Data$Area<-ifelse(Data$Beach=='Buddina Beach','Sunshine Coast South & Bribie Island',Data$Area)
To make the data more interesting, let's add a location attribute so that we can map the data later on.
The first thing we need to do is load the new data:
#Read Gear Locations csv
GearLocations<-read.csv("Raw Data/Gear_Locations.csv",header = TRUE)

Now let's look at the data:
head(GearLocations)

## Beach Location Feature Gear_Number
## 1 Sunshine Coast South & Bribie Island Alexandra Headland Net 7
## 2 Sunshine Coast South & Bribie Island Alexandra Headland Net 8
## 3 Townsville Alma Bay Drum Line 0
## 4 Townsville Alma Bay Drum Line 15
## 5 Townsville Alma Bay Drum Line 16
## 6 Townsville Alma Bay Drum Line 17
## Accuracy x y
## 1 153.1151 -26.66671
## 2 153.1176 -26.66990
## 3 Approximate Location 146.8740 -19.14989
## 4 146.8744 -19.15187
## 5 146.8739 -19.15070
## 6 146.8739 -19.15032
Which columns does this data frame have in common with our data frame? Which columns are additional? If this were a bigger data frame with 50+ columns, we would probably check this using functions available in R. However, since we have a relatively small number of columns, we can easily see that except for the x and y columns (the coordinates), all columns appear in both data frames.
Since each row has a unique set of coordinates, we need to use all common columns to correctly match between our data frame and the location data frame. This is done by specifying multiple columns in the join function.
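For wider tables, comparing column names programmatically is safer than eyeballing them. A minimal sketch with two toy data frames (standing in for Data and GearLocations, not the real ones):

```r
# Toy data frames standing in for the two tables we want to join
a <- data.frame(Beach = "X", Area = "Y", Fate = "Dead")
b <- data.frame(Beach = "X", Area = "Y", x = 153.1, y = -26.7)

common <- intersect(names(a), names(b)) # columns the two frames share
extra  <- setdiff(names(b), names(a))   # columns only in the second frame

common
extra
```

The shared columns are exactly the candidates for the `by` argument of the join; the extra columns are what the join will add.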
#Change column names in new data frame so that they match
GearLocations<-GearLocations%>%
rename(Area=Beach,Beach=Location)
#Join new gear location data to the dataframe
Data<-left_join(Data,GearLocations,by=c('Beach','Area','Feature','Gear_Number'))
Before we define targeted shark species in our data, there is a small fix that needs to be applied to the species name column first. Similar to the issue with the beach names, I spotted this while working with the data.
#fix some species names
Data$SpeciesName<-ifelse(Data$SpeciesName=='Common Blacktip Wha','Common Blacktip Whaler',Data$SpeciesName)
Data$SpeciesName<-ifelse(Data$SpeciesName=='Eagle Ray *','Eagle Ray',Data$SpeciesName)
Data$SpeciesName<-ifelse(Data$SpeciesName=='Hammerhead Shark *','Hammerhead Shark',Data$SpeciesName)
Data$SpeciesName<-ifelse(Data$SpeciesName=='Wobbegong *','Wobbegong',Data$SpeciesName)
Data$SpeciesName<-ifelse(Data$SpeciesName=='Devilray *','Devilray',Data$SpeciesName)
Data$SpeciesName<-ifelse(Data$SpeciesName=='Shovelnosed Ray *','Shovelnosed Ray',Data$SpeciesName)

Similar to the location data, we need to load the new data and join it to our dataframe:
#Load targeted species dataset
targeted_sharks<-read.csv('Raw Data/Targeted_Shark_Species.csv',header = TRUE)
#Change column name to be identical as main data
targeted_sharks<-targeted_sharks%>%
rename(SpeciesName=Name)
#Join Targeted Shark Dataset to main dataset
Data<-left_join(Data,targeted_sharks,by="SpeciesName")
#Create value for all non-targeted species
Data$Targeted<-ifelse(is.na(Data$Targeted),'Non_Targeted',Data$Targeted)

At the time of writing this tutorial, 2023 is not over yet, so to avoid including any incomplete data, I filtered it out. If you downloaded the 2023 data, then feel free to skip this step.
Data<-Data%>%
filter(Year!=2023)
That's it, the data is ready! If you want, you can export the data frame as a csv and use it in any data visualisation platform you like.
write_csv(Data,'Data.csv')

Now it's time to start visualising all this data with R!
Let's start with some basic calculations to get an overview of the data: How many animals were caught in the program (for the years we have data for)? How many of them are sharks compared to other species? What are the percentages of shark vs non-shark?
I personally think the easiest way to answer these kinds of questions is with dplyr. Another great library that makes life much easier is janitor; let's check it out.
Using dplyr:
#Total animals caught
Data%>%
summarise(total=n())

## # A tibble: 1 x 1
## total
## <int>
## 1 17470
#Since we know that each row represents one animal, we can also just count the total number of rows
nrow(Data)

## [1] 17470
#Total animals caught by species group
#using dplyr
Data%>%
#this will give us the total number per species group
group_by(SpeciesGroup)%>%
summarise(total=n())%>%
#and this to get the percentages per each group
ungroup()%>%
mutate(percent=total/sum(total)*100)

## # A tibble: 4 x 3
## SpeciesGroup total percent
## <chr> <int> <dbl>
## 1 Mammal 438 2.51
## 2 Other 2399 13.7
## 3 Shark 13674 78.3
## 4 Turtle 959 5.49
Alternatively, we can use the janitor library with much simpler code:
Data%>%
tabyl(SpeciesGroup)%>%
mutate(percent=percent*100)%>%
arrange(desc(percent))

## SpeciesGroup n percent
## Shark 13674 78.271322
## Other 2399 13.732112
## Turtle 959 5.489410
## Mammal 438 2.507155
Using dplyr and tabyl we can write many quick queries to get some basic statistics from the data. Here are a few more examples:
#sharks and non-sharks (grouping all other categories together)
Data%>%
mutate(shark=case_when(
SpeciesGroup=='Shark'~'shark',
TRUE~'non-shark'
))%>%
tabyl(shark)%>%
mutate(percent=percent*100)

## shark n percent
## non-shark 3796 21.72868
## shark 13674 78.27132
#targeted species vs non- targeted
Data%>%
tabyl(Targeted)

## Targeted n percent
## Non_Targeted 9586 0.5487121
## Targeted 7884 0.4512879
#targeted species vs non- targeted
Data%>%
tabyl(Targeted,Fate)%>%
adorn_percentages()

## Targeted Alive Dead
## Non_Targeted 0.3054454 0.6945546
## Targeted 0.2949011 0.7050989
#Total survival rate all
Data%>%
tabyl(Fate)

## Fate n percent
## Alive 5253 0.3006869
## Dead 12217 0.6993131
#species and fate
Data%>%
tabyl(SpeciesGroup,Fate)%>%
adorn_percentages()

## SpeciesGroup Alive Dead
## Mammal 0.3287671 0.6712329
## Other 0.4747812 0.5252188
## Shark 0.2299985 0.7700015
## Turtle 0.8602711 0.1397289
#top 10 most caught species
Data%>%
group_by(SpeciesName)%>%
summarise(count=n(),Targeted=Targeted)%>%
distinct()%>%
ungroup()%>%
slice_max(count,n=10)

## # A tibble: 10 x 3
## SpeciesName count Targeted
## <chr> <int> <chr>
## 1 Tiger Shark 4920 Targeted
## 2 Bull Whaler 2261 Targeted
## 3 Long Nose Whaler 1122 Non_Targeted
## 4 Blacktip Reef Whaler 1093 Non_Targeted
## 5 Cownose Ray 732 Non_Targeted
## 6 Loggerhead Turtle 672 Non_Targeted
## 7 Scalloped Hammerhead 623 Non_Targeted
## 8 Tawny Shark 602 Non_Targeted
## 9 Spot-Tail Whaler 590 Non_Targeted
## 10 Pigeye Whaler 384 Non_Targeted
And more examples:
#Top locations where any species is caught
Data%>%
group_by(Beach)%>%
summarise(count=n())%>%
distinct()%>%
ungroup()%>%
slice_max(count,n=10)

## # A tibble: 10 x 2
## Beach count
## <chr> <int>
## 1 Rainbow Beach 1590
## 2 Tannum Sands 1058
## 3 Kellys Beach 760
## 4 Noosa 719
## 5 Horseshoe Bay 707
## 6 Harbour Beach 681
## 7 Ellis Beach 481
## 8 Bucasia Beach 464
## 9 Florence Bay 449
## 10 Neilson Park 440
#Top locations where most un-targeted species are caught
Data%>%
group_by(Beach,Targeted)%>%
summarise(n=n())%>%
pivot_wider(names_from = Targeted,values_from = n)%>%
mutate(Total=sum(Targeted,Non_Targeted))%>%
arrange(desc(Non_Targeted))

## # A tibble: 91 x 4
## # Groups: Beach [91]
## Beach Non_Targeted Targeted Total
## <chr> <int> <int> <int>
## 1 Rainbow Beach 1034 556 1590
## 2 Tannum Sands 557 501 1058
## 3 Noosa 530 189 719
## 4 Horseshoe Bay 382 325 707
## 5 Bucasia Beach 283 181 464
## 6 Ellis Beach 244 237 481
## 7 Currumbin Beach 229 20 249
## 8 Burleigh Beach 220 16 236
## 9 Bilinga Beach 219 20 239
## 10 Kellys Beach 210 550 760
## # i 81 more rows
Alternatively you could use something as short as:
Data%>%
tabyl(Beach,Targeted)

#Top locations where most un-targeted species are caught, but this time let's use percentages out of total
Data%>%
tabyl(Beach,Targeted)%>%
group_by(Beach)%>%
mutate(Total=sum(Non_Targeted,Targeted))%>%
mutate(Non_Targered_=Non_Targeted/Total*100,
Targered_=Targeted/Total*100)%>%
summarise(Non_Targered_,Targered_)%>%
arrange(desc(Non_Targered_))

## # A tibble: 91 x 3
## Beach Non_Targered_ Targered_
## <chr> <dbl> <dbl>
## 1 Mermaid Beach 94.1 5.93
## 2 Miami Beach 93.3 6.71
## 3 Burleigh Beach 93.2 6.78
## 4 Coolum Beach 93.0 7.02
## 5 Alexandra Headland 92.6 7.45
## 6 Kurrawa Beach 92.5 7.48
## 7 Currumbin Beach 92.0 8.03
## 8 Coolangatta Beach 91.8 8.21
## 9 Bilinga Beach 91.6 8.37
## 10 Kirra Beach 91.0 9.00
## # i 81 more rows
I was personally very curious to see whether there is a correlation between the use of nets and animal death. In other words, are nets more lethal than drumlines, and are they effective for catching targeted species?
#correlation net vs death rate
cor_net_death<-Data%>%
#aggregating by beach and feature
group_by(Beach,Feature)%>%
summarise(n=n())%>%pivot_wider(names_from=c(Feature),values_from=n)%>%
replace(is.na(.),0)%>%
#join to same dataset aggregated by beach and targeted
left_join(Data%>%
group_by(Beach,Fate)%>%
summarise(n=n())%>%pivot_wider(names_from=c(Fate),values_from=n)%>%
replace(is.na(.),0),by="Beach")%>%
mutate(total=Dead+Alive)%>%
#creating proportions for correlation
mutate(prop_dead=Dead/total,prop_net=Net/total,prop_alive=1-prop_dead)%>%
#removing irrelevant values
filter(prop_net<1 & prop_net>0)
#Pearson's correlation test
cor(cor_net_death$prop_net,cor_net_death$prop_dead)

## [1] 0.6694398
The correlation coefficient is ~0.67, which indicates a strong correlation between the proportion of deaths and the proportion of net usage. In other words, the higher the share of animals caught by net at a location, the higher the share of animals whose fate is death.
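Note that cor() returns only the coefficient. If you also want a p-value and a confidence interval for the correlation, you could run a full Pearson's test with base R's cor.test. Sketched here on made-up vectors, since the result depends on the real data:

```r
# Made-up proportions standing in for prop_net and prop_dead
prop_net  <- c(0.1, 0.3, 0.5, 0.6, 0.8)
prop_dead <- c(0.2, 0.4, 0.5, 0.7, 0.9)

test <- cor.test(prop_net, prop_dead, method = "pearson")
test$estimate # the correlation coefficient, same value cor() returns
test$p.value  # significance of the correlation
```

On the real cor_net_death columns this would tell you whether the ~0.67 coefficient is statistically distinguishable from zero given the number of beaches.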
#correlation net vs non-targeted
cor_net_target<-Data%>%
group_by(Beach,Feature)%>%
summarise(n=n())%>%pivot_wider(names_from=c(Feature),values_from=n)%>%
replace(is.na(.),0)%>%
left_join(Data%>%
group_by(Beach,Targeted)%>%
summarise(n=n())%>%pivot_wider(names_from=Targeted,values_from=n)%>%
replace(is.na(.),0),by="Beach")%>%
mutate(total=sum(Targeted,Non_Targeted))%>%
mutate(prop_non_targeted=Non_Targeted/total,prop_net=Net/total,prop_targeted=1-prop_non_targeted)%>%
filter(prop_net<1 & prop_net>0)
#Pearson's correlation test
cor(cor_net_target$prop_net,cor_net_target$prop_non_targeted)

## [1] 0.7878282
The correlation coefficient is ~0.79, which indicates a very strong correlation between the proportion of net usage and the proportion of non-targeted species caught. In other words, the higher the share of animals caught by net at a location, the more likely they are to be non-targeted species.
Try to think of other analyses you could do. More correlations? Regressions? Get creative, there are so many possibilities!
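For instance, a simple linear regression with lm() would quantify how the death rate changes as net usage increases. A minimal sketch on synthetic data (with the tutorial data, the model would be lm(prop_dead ~ prop_net, data = cor_net_death)):

```r
# Sketch: simple linear regression of death rate on net usage.
# Synthetic data stands in for cor_net_death here.
set.seed(42)
prop_net  <- runif(60)
prop_dead <- 0.2 + 0.5*prop_net + rnorm(60, sd = 0.05)

fit <- lm(prop_dead ~ prop_net)
summary(fit)$coefficients  # estimates, standard errors, t values, p-values
coef(fit)['prop_net']      # estimated slope: change in death rate per unit of net usage
```

The slope is directly interpretable: it estimates how much the death rate rises for each additional unit of net-usage proportion, which is more informative than the correlation coefficient alone.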
# Plotting Style
PlotStyle<-theme(plot.title = element_text(family = "Arial", face = "bold", size = (15),hjust = 0.5),
legend.title = element_text(colour = 'black', face = "bold.italic", family = "Arial"),
legend.text = element_text(face = "italic", colour='black',family = "Arial"),
axis.title = element_text(family = "Arial", size = (10), colour = "black"),
axis.text = element_text(family = "Courier", colour = "black", size = (10)))
# Colour Palette
colors<- c('#3b4395','#00a0a0','navy','#800000','#09AD50','#7B0CB0','#B05B05','#AD0980','#17283B','#AC482B','#D42415','#2DE6FA','#80DE1D','#2156FF','#3D4F80','#FF73EA')
#total animals caught and death rate over the years (dual y-axis)
Data%>%
group_by(Year,Fate)%>%
summarise(n=n())%>%
pivot_wider(names_from = Fate,values_from = n)%>%
mutate(Total=sum(Dead,Alive))%>%
mutate(perc_alive=Alive/Total,perc_dead=Dead/Total)%>%
ggplot(aes(x=Year,y=Total))+
geom_col(aes(y=Total),alpha=0.8,fill='#0B9EB8')+
geom_line(aes(y=perc_dead*1000),size=1,colour='#6B1111')+
scale_y_continuous(
name = 'Total Caught',
sec.axis = sec_axis(~./1000,name = 'Death Rate',labels = scales::percent ))+
scale_x_continuous(breaks = seq(2001,2022,by=2))+
geom_text(aes(label = Total, y = Total-40),angle=90) +
PlotStyle+
theme(
axis.text.y.right = element_text(colour='#6B1111'),
axis.title.y.right = element_text(colour='#6B1111')
)+
labs(title = 'Total Caught and Death Rate')

Data%>%
group_by(Year,Targeted,Fate)%>%
summarise(n=n())%>%
mutate(n=case_when(Fate=='Dead'~-n,TRUE~n))%>%
mutate(Targeted=replace(Targeted, Targeted=='Non_Targeted', 'Non Targeted'))%>%
mutate(Group=paste(Fate,'-',Targeted))%>%
group_by(Year,Fate)%>%
mutate(Total=sum(n))%>%
ungroup()%>%
arrange(Year,Fate,n)%>%
ggplot(aes(x=Year,y=n,fill=Group))+
geom_bar(stat="identity", position="stack",alpha=0.7)+
PlotStyle+
scale_fill_manual(values =c('#066B23','#30B856','#B83030','#6B1111'))+
scale_y_continuous(labels = abs,limits=c(-700,700))+
scale_x_continuous(breaks = seq(2001,2023,by=2))+
geom_text(data = ~subset(., Fate=='Alive'), aes(label = Total, y = Total),angle=90,hjust=-0.1) +
geom_text(data = ~subset(., Fate=='Dead'), aes(label = -Total, y = Total),angle=90,hjust=1)+
labs(title = 'Dead or Alive?',x='Year',y='Total Number by Fate (Dead/Alive)')
Data%>%
group_by(SpeciesGroup,Year,Feature,Fate)%>%
summarise(n=n())%>%
pivot_wider(values_from = n,names_from = Fate)%>%
replace(is.na(.), 0)%>%
group_by(SpeciesGroup,Feature,Year)%>%
mutate(Percentage_Alive=Alive/sum(Alive,Dead),Percentage_Dead=Dead/sum(Alive,Dead))%>%
group_by(SpeciesGroup,Feature)%>%
summarise(ave_dead=mean(Percentage_Dead))%>%
mutate(SpeciesGroup=fct_reorder(SpeciesGroup,desc(ave_dead)))%>%
ggplot(aes(x=SpeciesGroup,y=ave_dead,fill=SpeciesGroup))+
geom_col(alpha=0.7)+
coord_flip()+
PlotStyle+
theme(
legend.position = 'none')+
scale_fill_manual(values =c('#0C6B63','#0BB8AA','#6B1111','#6BB830'))+
scale_y_continuous(labels = scales::percent ,limits=c(0,1))+
geom_text( aes(label = paste0(round(ave_dead,2)*100,'%'), y = ave_dead),hjust=-0.1) +
labs(title = 'Death Rate by Species Group and Feature',x='Species Group',y='Average Death Rate')+
facet_wrap(~Feature)
Data%>%
group_by(SpeciesName,SpeciesGroup,Targeted,Fate)%>%
summarise(n=n())%>%
pivot_wider(values_from = n,names_from = Fate)%>%
replace(is.na(.), 0)%>%
filter(Dead+Alive>=25)%>%
group_by(SpeciesName,SpeciesGroup)%>%
summarise(Targeted=Targeted,Percentage_Alive=Alive/sum(Alive,Dead),Percentage_Dead=Dead/sum(Alive,Dead))%>%
arrange(desc(SpeciesGroup),Percentage_Dead)%>%
ungroup()%>%
mutate(num=row_number())%>%
mutate(Targeted=replace(Targeted, Targeted=='Non_Targeted', 'Non Targeted'))%>%
mutate(SpeciesName=fct_reorder(SpeciesName,num))%>%
ggplot(aes(x=SpeciesName,y=Percentage_Dead))+
geom_segment(aes(x = SpeciesName, xend=SpeciesName,y=0,yend=Percentage_Dead,colour = SpeciesGroup),
position = position_dodge(width = 1),size=1)+
geom_point(aes(colour=SpeciesGroup,shape=Targeted),size = 7)+
coord_flip()+
scale_color_manual(values =c('#0C6B63','#0BB8AA','#6B1111','#6BB830'))+
geom_text(aes(label = round(Percentage_Dead,2)*100 ), color = "white", size = 3)+
PlotStyle+
scale_y_continuous(labels = scales::percent ,limits=c(0,1))+
labs(title = 'Death Rate by Species (n>=25)',y='Death Rate',x='Species')
Data%>%
mutate(Targeted=str_replace(Targeted,'Non_Targeted','Non-Targeted'))%>%
group_by(Year)%>%
ggplot(aes(Year, after_stat(count), fill = Targeted)) +
geom_density(position = "fill",alpha=.7)+
PlotStyle+
scale_fill_manual(values =c('#2156FF','#D42415'))+
scale_y_continuous(labels = scales::percent ,limits=c(0,1))+
scale_x_continuous(breaks = seq(2001,2022,by=2),guide = guide_axis(check.overlap = TRUE))+
labs(title = 'Percentage of Targeted VS Non-Targeted Species \n by Feature',x='Year',y='Percent of Animals Caught')+
facet_wrap(~Feature)
Data%>%
mutate(Targeted=str_replace(Targeted,'Non_Targeted','Non-Targeted'))%>%
mutate(Group=paste(Targeted,'-',Fate))%>%
group_by(Year)%>%
ggplot(aes(Year, after_stat(count), fill = Group)) +
geom_density(position = "fill",alpha=.7)+
PlotStyle+
scale_fill_manual(values =c('#14C7FF','#137DE8','#B83030','#6B1111'))+
scale_y_continuous(labels = scales::percent ,limits=c(0,1))+
scale_x_continuous(breaks = seq(2001,2022,by=2),guide = guide_axis(check.overlap = TRUE))+
labs(title = 'Percentage of Targeted VS Non-Targeted Species \n by Feature and Fate',x='Year',y='Percent of Animals Caught')+
facet_wrap(~Feature)

cor_net_death%>%
ggplot(aes(x=prop_net,y=prop_dead))+
geom_point(position ='jitter',colour='#00a0a0',size=1.5)+
PlotStyle+
stat_smooth(geom='line',alpha=0.8,method='lm',formula = 'y~x',se=FALSE,colour='navy',linewidth=1)+
stat_cor(method = 'pearson',label.x = 0, label.y = 1,p.accuracy = 0.001, r.accuracy = 0.01)+ #stat_cor() comes from the ggpubr package
labs(title = "Correlation Analysis",x='Proportion of Net Usage',y='Proportion of Deaths')

cor_net_target%>%
ggplot(aes(x=prop_net,y=prop_non_targeted))+
geom_point(position ='jitter',colour='#00a0a0',size=1.5)+
PlotStyle+
stat_smooth(geom='line',alpha=0.8,method='lm',formula = 'y~x',se=FALSE,colour='navy',linewidth=1)+
stat_cor(method = 'pearson',label.x = 0, label.y = 1,p.accuracy = 0.001, r.accuracy = 0.01)+ #stat_cor() comes from the ggpubr package
labs(title = "Correlation Analysis",x='Proportion of Net Usage',y='Proportion of Non-targeted Species')
Starting with an important note: even though the following section does not include any complex GIS analysis, if you have never worked with location data before, I strongly recommend looking at other resources that explain the topic more thoroughly. A simple Google search will do the trick; otherwise, a good starting point is “Geocomputation with R” - https://r.geocompx.org/index.html.
Working with location data requires some GIS (Geographic Information Systems) libraries to be installed and loaded. Base R and some other data analysis libraries do have basic functions for analysing and visualising geographic data, but I find the easiest way is to use GIS libraries that are built specifically for that purpose.
To begin, install the following packages:
#Installing Packages
install.packages('sf')
install.packages('tmap')
install.packages('ggspatial')

And then load them:
#Loading Packages
library('sf')
library('tmap')
library('ggspatial')

As you saw, our dataframe includes coordinates (x,y) for each gear location. In order to visualise the data geographically, we must first define which columns hold the geographic attributes. Generally, there are different types of coordinate reference systems (CRS) - I won’t get into that topic - but what you need to know is that our data is stored in the same CRS that Google Maps coordinates use, EPSG:4326 (also known as WGS 84).
Before we get into complex visualistations, let’s start with the basics.
First check: do we have location attributes for the entire dataset? In other words, are there any missing coordinates? (Checking either x or y is enough to get the answer.)
#check for any N/As in column x
any(is.na(Data$x))

## [1] TRUE
This returned TRUE, which means yes, we do have some missing values. There are different ways to impute missing data, but to keep things simple, let’s just ignore the rows with missing coordinates. Second check: do the coordinate values make sense and match the “real world”? The best way to check this is by visualising the data!
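The any() check only gives a yes/no answer. If you also want to know how many coordinates are missing, and in which columns, a quick sketch (shown on a small synthetic data frame here so it runs on its own; with the tutorial data you would apply it to Data):

```r
# Sketch: summarising missing values, on a synthetic data frame
# that mimics the x/y coordinate columns.
df <- data.frame(x = c(153.1, NA, 153.2), y = c(-26.7, -26.8, NA))

colSums(is.na(df))  # number of NAs per column
mean(is.na(df$x))   # proportion of missing x values
```

Knowing the proportion of missing rows helps you judge whether dropping them (as we do below) is acceptable or whether imputation is worth the effort.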
Note that there are endless ways to do this: ggplot, leaflet, tmap, mapdeck, etc.
With ggplot (without converting the dataframe):
#Create a shape of Australia using ggplot
au_shape<-map_data("world")%>%filter(region=='Australia')
Data%>%
filter(!is.na(x))%>% #filtering out N/A coordinates that can't be visualised
ggplot()+
geom_polygon(data=au_shape,aes(x=long,y=lat,group=group),color='black',fill='white')+
geom_point(aes(x=x,y=y),color='red')

With ggplot and converting the dataframe (better practice):
Data%>%
filter(!is.na(x))%>% #filtering out N/A coordinates that can't be visualised
st_as_sf(coords = c('x','y'),crs=4326)%>% #converting the data into an sf (geographic) dataframe
ggplot()+
annotation_map_tile("osm")+ #calling for open street maps (OSM) basemap
geom_sf()

If you want to do that with tmap, use the code below. This might take some time to load since we have a lot of points in different locations. The nice thing about tmap, though, is that you can easily create interactive maps without setting too many parameters. I personally think tmap is usually the easiest option if you don’t have too much spatial data, but feel free to use your preferred method.
#set tmap to interactive map view
tmap_mode('view')
Data%>%
filter(!is.na(x))%>% #filtering out N/A coordinates that can't be visualised
st_as_sf(coords = c('x','y'),crs=4326)%>% #converting the data into an sf (geographic) dataframe
tm_shape()+
tm_dots()

OK, check done: the data is located in the right place, along the coast of Queensland, Australia! Now let’s create something nicer!
Here are three options you can use as a starting point:
Currently, each row is represented by a set of coordinates. We have many different points, but not so many different locations. Let’s look at our first location (alphabetically):
Data%>%
arrange(Beach)%>%
select(c(SpeciesName,Beach,x,y))%>%
head(10)

## # A tibble: 10 x 4
## SpeciesName Beach x y
## <chr> <chr> <dbl> <dbl>
## 1 Common Dolphin Alexandra Headland 153. -26.7
## 2 Long Nose Whaler Alexandra Headland 153. -26.7
## 3 Long Nose Whaler Alexandra Headland 153. -26.7
## 4 Manta Ray Alexandra Headland 153. -26.7
## 5 Manta Ray Alexandra Headland 153. -26.7
## 6 Pigeye Whaler Alexandra Headland 153. -26.7
## 7 Ray Alexandra Headland 153. -26.7
## 8 Scalloped Hammerhead Alexandra Headland 153. -26.7
## 9 White Shark Alexandra Headland 153. -26.7
## 10 Devilray Alexandra Headland 153. -26.7
As you can see, the x and y values are almost or completely identical across rows. This is because each row represents a catch recorded at the gear location. So if we aggregate the data by beach, we end up with one point per location, which is fine since the exact position of the gear is not important for this analysis. We can simply average the x and y coordinates for each beach location. The good thing is, we can also use these averages to fill in for the missing coordinates that we detected previously.
#creating new dataframe with aggregated locations
data_location<-Data%>%
group_by(Beach,Area)%>%
summarise(x=mean(x,na.rm=T),y=mean(y,na.rm=T))%>%
filter(!is.na(x))%>% #in case any NAs are left
st_as_sf(coords = c('x','y'),crs=4326)

Now let’s inspect the data using tmap’s interactive map:
#set tmap to interactive map mode
tmap_mode('view')
#set the basemap
tmap_options(basemaps = 'CartoDB.Positron')
#create map
tm_shape(data_location)+
tm_dots()

Now let’s merge some interesting data and create a nice map.
#create subset data to join
location_stats<-Data%>%
group_by(Beach,Area,Targeted,Fate)%>%
summarise(n=n())%>%
pivot_wider(names_from=c(Targeted,Fate),values_from=n)%>%
replace(is.na(.),0)%>%
group_by(Beach,Area)%>%
mutate(total = rowSums(across(where(is.numeric))),
total_targeted=Targeted_Alive+Targeted_Dead,
total_non_targeted=Non_Targeted_Alive+Non_Targeted_Dead)
#join to location dataframe
data_location<-data_location%>%
left_join(location_stats,by=c('Beach','Area'))
#remove extra dataframe
rm(location_stats)

And visualising this on a map with some interesting statistics:
tm_shape(data_location%>%
mutate(survival_rate_non_tar=round(Non_Targeted_Alive/total_non_targeted*100,2)))+
tm_dots(col='survival_rate_non_tar',
size='total_non_targeted',
border.col='black',
border.alpha = .5,
alpha=0.5,
palette="RdYlBu",
contrast=1,
title = 'Survival Rate (%)',
popup.vars=c("Total Non Targeted Caught"="total_non_targeted", "Survival Rate"="survival_rate_non_tar"),
style="fixed", breaks=c(seq(10,100,by=10)))

Option 2:
tm_shape(data_location%>%
mutate(targeted_rate=round(total_targeted/total*100,2)))+
tm_dots(col='targeted_rate',
size='total',
border.col='black',
border.alpha = .5,
alpha=0.5,
palette="BuPu",
contrast=1,
title = '% of Targeted Species',
popup.vars=c("Total Caught"="total", "Ratio of Targeted Species"="targeted_rate"),
style="fixed", breaks=c(seq(10,100,by=10)))

It is also possible to use the original version of the data (without aggregating by location) to create nice maps.
For example, heatmaps!
install.packages('leaflet')
install.packages('leaflet.extras')

library('leaflet')
library('leaflet.extras')
data_location%>%
mutate(x = unlist(map(geometry,1)),
y = unlist(map(geometry,2)))%>%
st_drop_geometry()%>%
leaflet()%>%
addTiles() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
setView(153.44513560058888,-28.02118802627169,7)%>%
addHeatmap(lng=~x,lat=~y,max=500,intensity = ~total_targeted,radius=10,blur=5)